3 + 4
[1] 7
1 / 200 * 30
[1] 0.15
sin(pi / 2)
[1] 1
sqrt(16)
[1] 4
Session 3: Data exploration part 2
November 15, 2023
select and mutate
size aesthetic?
The following closely mirrors R4DS chapter 3
Try out a few calculations in your RStudio console!
<-
object_name <- value
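As a small illustration of the assignment pattern above, the sketch below creates two objects and combines them. The object names are made up for this example.

```r
# Assign values to objects with <- (names are illustrative only)
x <- 3 * 4
y <- x + 2

# Typing an object's name prints its value in the console
x
y
```

Reading `x <- 3 * 4` aloud as "x gets three times four" can help when you are getting used to the arrow.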
b + c in your console once you have created them
gapminder examples that you will be working with as an exercise later)
this, press Tab and see what happens
this_is_too and then try code completion again
this_is_a_really_long_name to 3.5
this and pressing Cmd/Ctrl + up_arrow
this_is_a_really_long_name
r_rock or R_rocks or if you use the American spelling of color
ggplot, filter etc.)
function_name(arg1 = val1, arg2 = val2, …)
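To make the general calling pattern concrete, here is a minimal sketch using base R's seq(). The object name is chosen purely for illustration.

```r
# Call seq() with named arguments, following function_name(arg1 = val1, ...)
one_to_ten <- seq(from = 1, to = 10)
one_to_ten

# Arguments can also be matched by position; by sets the step size
odd_numbers <- seq(1, 10, by = 2)
odd_numbers
```

Naming the arguments is optional here, but it tends to make calls easier to read when a function takes several inputs.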
se and hit Tab
seq() by typing q or using the arrow keys?
seq
x <- "hello
+ shown in the console is the “continuation character”, telling you that something is missing
" or a )
The Environment panel shows all of the objects that you have created and their values. It is located in the upper-right pane of RStudio by default, but if you have changed your pane layout to match mine, it will be in the bottom-left pane.
… and consequences for workflow
A study is replicable if the same results are obtained when the study is repeated.
An analysis is reproducible if the same results can be obtained from the data
Complete exercises 1-3 in section 3.5 of R4DS
geom_point() for scatterplots
geom_line() for line graphs
geom_col() for column or bar graphs
geom_histogram() for histograms
geom_boxplot() for boxplots
Tip
The from Data to Viz website provides an excellent overview of which type of plot to use depending on your data. The R graph gallery is also very useful!
Let’s start by looking at flipper length and go through our checklist.
Is flipper length predictive of body mass?
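One possible way to approach this question is sketched below. It assumes the penguins data come from the palmerpenguins package and that flipper length is stored in a column called flipper_length_mm; neither is stated explicitly here, so adjust to your own setup.

```r
library(ggplot2)

# Assumption: the penguins data come from the palmerpenguins package
penguins <- palmerpenguins::penguins

# Outcome (body mass) on the y-axis, predictor (flipper length) on the x-axis
p_flipper <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()
p_flipper
```

ggplot2 will warn that a couple of rows with missing values were dropped; that is expected with this dataset.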
Is species predictive of body mass?
No data manipulation needed, body_mass_g (outcome) -> y-axis, species (predictor) -> x-axis. Try using geom_point() again …
The output is not quite what we would hope for …
Is species predictive of body mass?
For this type of question, geom_jitter() gives a better result.
Much better!
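The jittered version might look something like the sketch below. The data source (palmerpenguins) and the width and height values are assumptions made for illustration.

```r
library(ggplot2)

# Assumption: the penguins data come from the palmerpenguins package
penguins <- palmerpenguins::penguins

# geom_point() would stack all penguins of a species on one vertical line;
# geom_jitter() adds a little random noise so individual points become visible
p_jitter <- ggplot(penguins, aes(x = species, y = body_mass_g)) +
  geom_jitter(width = 0.2, height = 0)  # jitter sideways only
p_jitter
```

Setting height = 0 keeps the body mass values exactly as recorded, so only the horizontal position is perturbed.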
Is species predictive of mean body mass?
Here, we can use geom_col(), but we need to do some summarising first.
Don’t forget to adjust your aesthetics to reflect the new variable names!
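The summarise-then-plot step could be sketched as follows. The data source (palmerpenguins) and the names mass_by_species and mean_body_mass are assumptions introduced for this example.

```r
library(dplyr)
library(ggplot2)

# Assumption: the penguins data come from the palmerpenguins package
penguins <- palmerpenguins::penguins

# One row per species with the mean body mass (missing values dropped)
mass_by_species <- penguins |>
  group_by(species) |>
  summarise(mean_body_mass = mean(body_mass_g, na.rm = TRUE))

# The y aesthetic now refers to the new summary variable
p_col <- ggplot(mass_by_species, aes(x = species, y = mean_body_mass)) +
  geom_col()
p_col
```

Because the plot is built from the summary table rather than the raw data, the aesthetics have to use the summary's column names.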
geom_line() is a good choice for this type of question: two numerical variables but only one y-value for each value of x.
How many penguins (of each species) were sighted across the different years?
We will see how to fix the axis numbers later.
How many penguins of each species were sighted across the different years?
Note the use of colour to introduce groups and also how we can easily add a subtitle.
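A rough sketch of the per-species line graph follows. The data source (palmerpenguins), the object name penguins_per_year and the subtitle text are all assumptions chosen for illustration.

```r
library(dplyr)
library(ggplot2)

# Assumption: the penguins data come from the palmerpenguins package
penguins <- palmerpenguins::penguins

# count() gives one row per year and species, with the tally in column n
penguins_per_year <- penguins |>
  count(year, species)

# colour = species draws one line per species
p_line <- ggplot(penguins_per_year, aes(x = year, y = n, colour = species)) +
  geom_line() +
  labs(subtitle = "Number of penguins sighted per year, by species")
p_line
```

Counting first means there is exactly one y-value per year within each species, which is what geom_line() needs.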
What is the distribution of bill depth?
We can use a histogram to answer this. Note that histograms only require an x aesthetic; the y-axis values are calculated automatically through the choice of geom. We also need to adapt the choice of binwidth to suit the variable under consideration.
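For instance, a histogram of bill depth might be sketched like this. The data source (palmerpenguins), the column name bill_depth_mm and the bin width of 0.5 are assumptions; try a few bin widths to see what suits the variable.

```r
library(ggplot2)

# Assumption: the penguins data come from the palmerpenguins package
penguins <- palmerpenguins::penguins

# Only x is mapped; the counts on the y-axis are computed by geom_histogram()
p_hist <- ggplot(penguins, aes(x = bill_depth_mm)) +
  geom_histogram(binwidth = 0.5)  # illustrative bin width, in mm
p_hist
```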
What is the distribution of bill depth by species?
Pair a histogram with facets.
Note the use of facet_wrap(~variable) to split the plot into facets according to the variable.
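Facetting the histogram could look like the sketch below, again assuming the palmerpenguins data, a bill_depth_mm column and an illustrative bin width.

```r
library(ggplot2)

# Assumption: the penguins data come from the palmerpenguins package
penguins <- palmerpenguins::penguins

# facet_wrap(~species) draws one histogram panel per species
p_facets <- ggplot(penguins, aes(x = bill_depth_mm)) +
  geom_histogram(binwidth = 0.5) +
  facet_wrap(~species)
p_facets
```

By default the panels share the same axes, which makes the species easy to compare at a glance.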
What is the distribution of bill depth by species?
Alternatively, we can use a boxplot.
What is the distribution of bill depth by species?
Note that colour is used for points, lines and outlines, while fill is used to fill shapes.
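The difference between the two can be seen in a small sketch such as the one below. The data source (palmerpenguins) and the specific colours are assumptions picked for illustration.

```r
library(ggplot2)

# Assumption: the penguins data come from the palmerpenguins package
penguins <- palmerpenguins::penguins

# colour sets the box outline and whiskers; fill sets the inside of the box
p_box <- ggplot(penguins, aes(x = species, y = bill_depth_mm)) +
  geom_boxplot(colour = "darkblue", fill = "lightblue")
p_box
```

Here both are fixed values set outside aes(), so they apply to every box rather than varying with the data.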
What is the distribution of bill depth by species and island?
You can use fill to introduce an additional set of groups. Note also how you can change the label for a particular aesthetic using labs.
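A hedged sketch of this idea is shown below. The data source (palmerpenguins) and the label text are assumptions made for the example.

```r
library(ggplot2)

# Assumption: the penguins data come from the palmerpenguins package
penguins <- palmerpenguins::penguins

# Mapping fill to island splits each species' boxplot by island;
# labs(fill = ...) changes the legend title for the fill aesthetic
p_box_island <- ggplot(penguins,
                       aes(x = species, y = bill_depth_mm, fill = island)) +
  geom_boxplot() +
  labs(fill = "Island")
p_box_island
```

Because fill is mapped inside aes() this time, ggplot2 adds a legend automatically.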
Run install.packages("gapminder") in the console (don’t put this into your qmd document)
# A tibble: 6 × 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.
4 Afghanistan Asia       1967    34.0 11537966      836.
5 Afghanistan Asia       1972    36.1 13079460      740.
6 Afghanistan Asia       1977    38.4 14880372      786.
The Posit data visualisation cheatsheet includes a lot of useful information, e.g. which geoms to use for which type of question
filter) and plot types (e.g. scatterplots) that we have looked at so far
Note: This document will also serve as a valuable reference for you going forward.